Character Data in Programming Languages
The C char type is supposed to be large enough to store any member of the execution character set. If a genuine character from that set is stored in a char object, its value is equivalent to the integer code for the character and is non-negative. The char type is also equivalent to a single byte and may be signed or unsigned (implementation dependent). C does not actually define the size of a byte, so in principle a byte could be made large enough that a char would accommodate multi-octet characters and Unicode characters. However, in most implementations, bytes and char objects are 8 bits, and multi-octet characters require a sequence of char objects.
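A minimal sketch of this behavior, assuming an 8-bit char (the common case, not a guarantee): the two-octet UTF-8 character for U+00E9 ("é") occupies two char objects, and each octet must be read as unsigned char to recover a non-negative value. The string literal spells out the octets explicitly so the example does not depend on the source file's encoding.

#include <limits.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /* UTF-8 encoding of U+00E9 ("é"): one character, two octets. */
    const char *s = "\xC3\xA9";

    printf("CHAR_BIT = %d\n", CHAR_BIT);                  /* usually 8 */
    printf("strlen(s) = %zu char objects\n", strlen(s));  /* 2, not 1 */

    /* Cast to unsigned char so each octet prints as a non-negative value. */
    for (size_t i = 0; i < strlen(s); i++)
        printf("byte %zu: 0x%02X\n", i, (unsigned char)s[i]);

    return 0;
}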
Instead, C provides the wide character, or wchar_t, type. This type is supposed to be large enough to hold the largest character in any extended execution set supported by the implementation (including MBCS encodings). It permits internal processing using fixed-size characters; C library functions such as mbstowcs() and wcstombs() convert between SBCS/MBCS strings and wide character strings. However, the size of wchar_t is implementation specific; although it is usually 16 or 32 bits, on some implementations it is equivalent to char.
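A hedged sketch of the conversion path (it assumes a UTF-8 locale is available from the environment, and the string and buffer sizes are illustrative): mbstowcs() decodes a multibyte char sequence into fixed-size wchar_t units suitable for internal processing.

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    /* Adopt the environment's locale so the multibyte encoding is known;
       this sketch assumes that locale uses UTF-8. */
    setlocale(LC_ALL, "");

    const char *mb = "\xC3\xA9t\xC3\xA9";  /* "été" in UTF-8: 5 char objects */
    wchar_t wide[16];

    size_t n = mbstowcs(wide, mb, 16);     /* decode into fixed-size units */
    if (n == (size_t)-1) {
        fputs("conversion failed (is a UTF-8 locale in effect?)\n", stderr);
        return EXIT_FAILURE;
    }

    printf("%zu char objects -> %zu wchar_t units (each %zu bytes)\n",
           strlen(mb), n, sizeof(wchar_t));
    return 0;
}

Note how the output makes the size caveat concrete: the same three characters cost five char objects but three wchar_t units, whose width varies by implementation.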
Java takes a different approach: bytes remain 8 bits, but a Java char is a 16-bit unit intended to contain a Unicode character (strictly, a UTF-16 code unit; characters outside the Basic Multilingual Plane require a surrogate pair of two char values).

Finally, programming languages generally provide some abstraction away from encoding details. For example, the C character constant 'A' may have the value 0x41 in an ASCII-based implementation but 0xC1 in an EBCDIC-based implementation. Nevertheless, programs may make more subtle assumptions about character encodings, such as assuming that A-Z have sequential, contiguous code points (not true in EBCDIC).
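A short illustration of that pitfall in standard C (the helper name is_upper_naive is hypothetical, introduced only for contrast): a range test hard-codes the contiguity assumption, whereas the <ctype.h> classification functions defer to the implementation's execution character set.

#include <ctype.h>
#include <stdio.h>

/* Non-portable: assumes 'A'..'Z' are contiguous, which holds in ASCII
   but not in EBCDIC, where the alphabet has gaps. */
static int is_upper_naive(char c) {
    return c >= 'A' && c <= 'Z';
}

int main(void) {
    /* 0x41 on an ASCII-based implementation, 0xC1 on an EBCDIC-based one. */
    printf("'A' has value 0x%02X here\n", (unsigned)'A');

    /* Portable: isupper() consults the implementation's character set. */
    char c = 'J';
    printf("naive: %d, isupper: %d\n",
           is_upper_naive(c), isupper((unsigned char)c) != 0);
    return 0;
}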